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Abstract 

Metaheuristic algorithms are becoming an important part of modern optimization. 
A wide range of metaheuristic algorithms have emerged over the last two decades, and 
many metaheuristics such as particle swarm optimization are becoming increasingly 
popular. Despite their popularity, mathematical analysis of these algorithms lacks be- 
hind. Convergence analysis still remains unsolved for the majority of metaheuristic 
algorithms, while efficiency analysis is equally challenging. In this paper, we intend to 
provide an overview of convergence and efficiency studies of metaheuristics, and try to 
provide a framework for analyzing metaheuristics in terms of convergence and efficiency. 
This can form a basis for analyzing other algorithms. We also outline some open ques- 
tions as further research topics. 
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1 Introduction 

Optimization is an important subject with many important application, and algorithms 
for optimization are diverse with a wide range of successful applications [10, 11]. Among 
these optimization algorithms, modern metaheuristics are becoming increasingly popular, 
leading to a new branch of optimization, called metaheuristic optimization. Most meta- 
heuristic algorithms are nature-inspired [8, 29, 32], from simulated annealing [20] to ant 
colony optimization [8], and from particle swarm optimization [17] to cuckoo search [35]. 
Since the appearance of swarm intelligence algorithms such as PSO in the 1990s, more than 
a dozen new metaheuristic algorithms have been developed and these algorithms have been 



applied to almost all areas of optimization, design, scheduling and planning, data mining, 
machine intelligence, and many others. Thousands of research papers and dozens of books 
have been published [8, 9, 11, 19, 29, 32, 33]. 

Despite the rapid development of metaheuristics, their mathematical analysis remains 
partly unsolved, and many open problems need urgent attention. This difficulty is largely 
due to the fact the interaction of various components in metaheuristic algorithms are highly 
nonlinear, complex, and stochastic. Studies have attempted to carry out convergence anal- 
ysis [1, 22], and some important results concerning PSO were obtained [7]. However, for 
other metaheuristics such as firefly algorithms and ant colony optimization, it remains an 
active, challenging topic. On the other hand, even we have not proved or cannot prove their 
convergence, we still can compare the performance of various algorithms. This has indeed 
formed a majority of current research in algorithm development in the research community 
of optimization and machine intelligence [9, 29, 33]. 

In combinatorial optimization, many important developments exist on complexity anal- 
ysis, run time and convergence analysis [25, 22]. For continuous optimization, no-free-lunch- 
theorems do not hold [1, 2]. As a relatively young field, many open problems still remain 
in the field of randomized search heuristics [1]. In practice, most assume that metaheuris- 
tic algorithms tend to be less complex for implementation, and in many cases, problem 
sizes are not directly linked with the algorithm complexity. However, metaheuristics can 
often solve very tough NP-hard optimization, while our understanding of the efficiency and 
convergence of metaheuristics lacks far behind. 

Apart from the complex interactions among multiple search agents (making the math- 
ematical analysis intractable), another important issue is the various randomization tech- 
niques used for modern metaheuristics, from simple randomization such as uniform distri- 
bution to random walks, and to more elaborate Levy flights [5, 24, 33]. There is no unified 
approach to analyze these mathematically. In this paper, we intend to review the conver- 
gence of two metaheuristic algorithms including simulated annealing and PSO, followed by 
the new convergence analysis of the firefly algorithm. Then, we try to formulate a framework 
for algorithm analysis in terms of Markov chain Monte Carlo. We also try to analyze the 
mathematical and statistical foundations for randomization techniques from simple random 
walks to Levy flights. Finally, we will discuss some of important open questions as further 
research topics. 

2 Convergence Analysis of Metaheuristics 

The formulation and numerical studies of various metaheuristics have been the main focus 
of most research studies. Many successful applications have demonstrated the efficiency of 
metaheuristics in various context, either through comparison with other algorithms and/or 



applications to well-known problems. In contrast, the mathematical analysis lacks behind, 
and convergence analysis has been carried out for only a minority few algorithms such as 
simulated annealing and particle swarm optimization [7, 22]. The main approach is often 
for very simplified systems using dynamical theory and other ad hoc approaches. Here in 
this section, we first review the simulated annealing and its convergence, and we move onto 
the population-based algorithms such as PSO. We then take the recently developed firefly 
algorithm as a further example to carry out its convergence analysis. 

2.1 Simulated Annealing 

Simulated annealing (SA) is one of the widely used metaheuristics, and is also one of the 
most studies in terms of convergence analysis [4, 20]. The essence of simulated annealing 
is a trajectory-based random walk of a single agent, starting from an initial guess xq. The 
next move only depends on the current state or location and the acceptance probability p. 
This is essentially a Markov chain whose transition probability from the current state to 
the next state is given by 

p = exp 



(D 



where ks is Boltzmann's constant, and T is the temperature. Here the energy change 
AE can be linked with the change of objective values. A few studies on the convergence 
of simulated annealing have paved the way for analysis for all simulated annealing-based 
algorithms [4, 15, 27]. Bertsimas and Tsitsiklis provided an excellent review of the conver- 
gence of SA under various assumptions [4, 15]. By using the assumptions that SA forms 
an inhomogeneous Markov chain with finite states, they proved a probabilistic convergence 
function P, rather than almost sure convergence, that 

maxP[xi(t)€S*x ]>—, (2) 

where S* is the optimal set, and A and a are positive constants [4]. This is for the cooling 
schedule T{t) = d/ln(t), where t is the iteration counter or pseudo time. These studies 
largely used Markov chains as the main tool. We will come back later to a more general 
framework of Markov chain Monte Carlo (MCMC) in this paper [12, 14]. 

2.2 PSO and Convergence 

Particle swarm optimization (PSO) was developed by Kennedy and Eberhart in 1995 [17, 
18], based on the swarm behaviour such as fish and bird schooling in nature. Since then, PSO 
has generated much wider interests, and forms an exciting, ever-expanding research subject, 
called swarm intelligence. PSO has been applied to almost every area in optimization, 
computational intelligence, and design/scheduling applications. 



The movement of a swarming particle consists of two major components: a stochastic 
component and a deterministic component. Each particle is attracted toward the position 
of the current global best g* and its own best location x* in history, while at the same time 
it has a tendency to move randomly. 

Let Xi and Vi be the position vector and velocity for particle i, respectively. The new 
velocity and location updating formulas are determined by 

v\ +1 = v t i + ae 1 \g*-x% + f3e 2 [x* - xj] . (3) 

x\^ = x\^vf\ (4) 

where e\ and e 2 are two random vectors, and each entry taking the values between and 1. 
The parameters a and j3 are the learning parameters or acceleration constants, which can 
typically be taken as, say, a ~ (3 « 2. 

There are at least two dozen PSO variants which extend the standard PSO algorithm, 
and the most noticeable improvement is probably to use inertia function 0(t) so that v\ is 
replaced by 9{t)v\ where 9 G [0, 1] [6]. This is equivalent to introducing a virtual mass to 
stabilize the motion of the particles, and thus the algorithm is expected to converge more 
quickly. 

The first convergence analysis of PSO was carried out by Clerc and Kennedy in 2002 [7] 
using the theory of dynamical systems. Mathematically, if we ignore the random factors, 
we can view the system formed by (3) and (4) as a dynamical system. If we focus on a 
single particle i and imagine that there is only one particle in this system, then the global 
best g* is the same as its current best x*. In this case, we have 

vl +1 = vl + 1 (g*-xl), 7 = « + (5) 

and 

x\^ = x\ + vf\ (6) 

Considering the ID dynamical system for particle swarm optimization, we can replace g* 
by a parameter constant p so that we can see if or not the particle of interest will converge 
towards p. By setting Ut = p — x{t + 1) and using the notations for dynamical systems, we 
have a simple dynamical system 

v t+ i = v t + <yu t , u t+ i = -v t + (1 - 7K, (7) 

or 

The general solution of this dynamical system can be written as It = YoexpL4t]. The 
system behaviour can be characterized by the eigenvalues A of A, and we have Ai 5 2 = 



1 - 7/2 ± vV ~ 47/2. It can be seen clearly that 7 = 4 leads to a bifurcation. Following 
a straightforward analysis of this dynamical system, we can have three cases. For < 
7 < 4, cyclic and/or quasi-cyclic trajectories exist. In this case, when randomness is 
gradually reduced, some convergence can be observed. For 7 > 4, non-cyclic behaviour 
can be expected and the distance from Y t to the center (0, 0) is monotonically increasing 
with t. In a special case 7 = 4, some convergence behaviour can be observed. For detailed 
analysis, please refer to Clerc and Kennedy [7]. Since p is linked with the global best, as 
the iterations continue, it can be expected that all particles will aggregate towards the the 
global best. 

2.3 Firefly algorithm, Convergence and Chaos 

Firefly Algorithm (FA) was developed by Yang [32, 34], which was based on the flashing 
patterns and behaviour of fireflies. In essence, each firefly will be attracted to brighter 
ones, while at the same time, it explores and searches for prey randomly. In addition, the 
brightness of a firefly is determined by the landscape of the objective function. 

The movement of a firefly i is attracted to another more attractive (brighter) firefly j is 
determined by 

x\ +1 = x\ + (3 e-^ {x) -xj) + aet (9) 

where the second term is due to the attraction. The third term is randomization with a 
being the randomization parameter, and e\ is a vector of random numbers drawn from a 
Gaussian distribution or other distributions. Obviously, for a given firefly, there are often 
many more attractive fireflies, then we can either go through all of them via a loop or use the 
most attractive one. For multiple modal problems, using a loop while moving toward each 
brighter one is usually more effective, though this will lead to a slight increase of algorithm 
complexity. 

Here is /3o £ [0,1] is the attractiveness at r = 0, and = ||jCj — Xj\\2 is the £2- 
norm or Cartesian distance. For other problems such as scheduling, any measure that can 
effectively characterize the quantities of interest in the optimization problem can be used 
as the 'distance' r. 

For most implementations, we can take (3o = 1, a = 0(1) and 7 = 0(1). It is worth 
pointing out that (9) is essentially a random walk biased towards the brighter fireflies. If 
Po = 0, it becomes a simple random walk. Furthermore, the randomization term can easily 
be extended to other distributions such as Levy flights [16, 24]. 

We now can carry out the convergence analysis for the firefly algorithm in a frame- 
work similar to Clerc and Kennedy's dynamical analysis. For simplicity, we start from the 
equation for firefly motion without the randomness term 

xf x = x\ + ^e-^%(x)-x^. (10) 




Figure 1: The chaotic map of the iteration formula (13) in the firefly algorithm and the 
transition between from periodic/multiple states to chaos. 

If we focus on a single agent, we can replace x*- by the global best g found so far, and we 
have 

xl+ l =xj + (3 e-^(g-xl), (11) 

where the distance rj can be given by the ^-norm rf = \ \g — as* 1 1§ - In an even simpler 1-D 
case, we can set y t = g — x\, and we have 

Vt+i = yt -Poe~^y t . (12) 

We can see that 7 is a scaling parameter which only affects the scales/size of the firefly 
movement. In fact, we can let ut = ^Fjyt and we have 

ut+i =ut[l-p e' u h (13) 

These equations can be analyzed easily using the same methodology for studying the well- 
known logistic map 

Ut+i = Xu t (l - u t ). (14) 

The chaotic map is shown in Fig. 1, and the focus on the transition from periodic multiple 
states to chaotic behaviour is shown in the same figure. 

As we can see from Fig. 1 that convergence can be achieved for (3q < 2. There is 
a transition from periodic to chaos at (3q ~ 4. This may be surprising, as the aim of 
designing a metaheuristic algorithm is to try to find the optimal solution efficiently and 
accurately. However, chaotic behaviour is not necessarily a nuisance; in fact, we can use it 
to the advantage of the firefly algorithm. Simple chaotic characteristics from (14) can often 
be used as an efficient mixing technique for generating diverse solutions. Statistically, the 
logistic mapping (14) with A = 4 for the initial states in (0,1) corresponds a beta distribution 

B(u,p,q) = Ek+A^-l(i - u f-\ (15) 



when p = q = 1/2. Here T(z) is the Gamma function 

r(z) = r V _1 e-*dt. (16) 
Jo 

In the case when z = n is an integer, we have T(n) = (n — 1)!. In addition, T(l/2) = y^. 
From the algorithm implementation point of view, we can use higher attractiveness /?o 
during the early stage of iterations so that the fireflies can explore, even chaotically, the 
search space more effectively. As the search continues and convergence approaches, we 
can reduce the attractiveness /3q gradually, which may increase the overall efficiency of the 
algorithm. Obviously, more studies are highly needed to confirm this. 

2.4 Markov Chain Monte Carlo 

From the above convergence analysis, we know that there is no mathematical framework in 
general to provide insights into the working mechanisms, the stability and convergence of a 
give algorithm. Despite the increasing popularity of metaheuristics, mathematical analysis 
remains fragmental, and many open problems need urgent attention. 

Monte Carlo methods have been applied in many applications [28], including almost 
all areas of sciences and engineering. For example, Monte Carlo methods are widely used 
in uncertainty and sensitivity analysis [21]. From the statistical point of view, most meta- 
heuristic algorithms can be viewed in the framework of Markov chains [14, 28]. For example, 
simulated annealing [20] is a Markov chain, as the next state or new solution in SA only 
depends on the current state/solution and the transition probability. For a given Markov 
chain with certain ergodicity, a stability probability distribution and convergence can be 
achieved. 

Now if look at the PSO closely using the framework of Markov chain Monte Carlo 
[12, 13, 14], each particle in PSO essentially forms a Markov chain, though this Markov 
chain is biased towards to the current best, as the transition probability often leads to the 
acceptance of the move towards the current global best. Other population-based algorithms 
can also be viewed in this framework. In essence, all metaheuristic algorithms with piece- 
wise, interacting paths can be analyzed in the general framework of Markov chain Monte 
Carlo. The main challenge is to realize this and to use the appropriate Markov chain theory 
to study metaheuristic algorithms. More fruitful studies will surely emerge in the future. 

3 Search Efficiency and Randomization 

Metaheuristics can be considered as an efficient way to produce acceptable solutions by 
trial and error to a complex problem in a reasonably practical time. The complexity of the 
problem of interest makes it impossible to search every possible solution or combination, the 
aim is to find good feasible solutions in an acceptable timescale. There is no guarantee that 



the best solutions can be found, and we even do not know whether an algorithm will work 
and why if it does work. The idea is to have an efficient but practical algorithm that will 
work most the time and is able to produce good quality solutions. Among the found quality 
solutions, it is expected some of them are nearly optimal, though there is no guarantee for 
such optimality. 

The main components of any metaheuristic algorithms are: intensification and diversi- 
fication, or exploitation and exploration [5, 33]. Diversification means to generate diverse 
solutions so as to explore the search space on the global scale, while intensification means 
to focus on the search in a local region by exploiting the information that a current good 
solution is found in this region. This is in combination with with the selection of the best 
solutions. 

As discussed earlier, an important component in swarm intelligence and modern meta- 
heuristics is randomization, which enables an algorithm to have the ability to jump out 
of any local optimum so as to search globally. Randomization can also be used for local 
search around the current best if steps are limited to a local region. Fine-tuning the ran- 
domness and balance of local search and global search is crucially important in controlling 
the performance of any metaheuristic algorithm. 

Randomization techniques can be a very simple method using uniform distributions, or 
more complex methods as those used in Monte Carlo simulations [28]. They can also be 
more elaborate, from Brownian random walks to Levy flights. 

3.1 Gaussian Random Walks 

A random walk is a random process which consists of taking a series of consecutive random 
steps. Mathematically speaking, let un denotes the sum of each consecutive random step 
Si, then un forms a random walk 



where Sj is a random step drawn from a random distribution. This suggests that the next 
state un will only depend the current existing state u^-i and the motion or transition un 
from the existing state to the next state. In theory, as the number of steps ./V increases, 
the central limit theorem implies that the random walk (17) should approaches a Gaussian 
distribution. In addition, there is no reason why each step length should be fixed. In fact, 
the step size can also vary according to a known distribution. If the step length obeys the 
Gaussian distribution, the random walk becomes the standard Brownian motion [16, 33]. 

From metaheuristic point of view, all paths of search agents form a random walk, in- 
cluding a particle's trajectory in simulated annealing, a zig-zag path of a particle in PSO, 
or the piecewise path of a firefly in FA. The only difference is that transition probabilities 
are different, and change with time and locations. 




(17) 



i=i 



Under simplest assumptions, we know that a Gaussian distribution is stable. For a 
particle starts with an initial location xq, its final location x^ after N time steps is 

N 

x N = x + ^2aiS i} (18) 
i=i 

where on > is a parameters controlling the step sizes or scalings. If Sj is drawn from a 
normal distribution iV(/ij, of), then the conditions of stable distributions lead to a combined 
Gaussian distribution 

N N 

x N ~N(n*,al), //* = ^o^, of = ^ a^of + 0* - //j) 2 ]. (19) 

i=l i=l 

We can see that that the mean location changes with N and the variances increases as N 
increases, this makes it possible to reach any areas in the search space if N is large enough. 

A diffusion process can be viewed as a series of Brownian motion, and the motion obeys 
the Gaussian distribution. For this reason, standard diffusion is often referred to as the 
Gaussian diffusion. As the mean of particle locations is obviously zero if //j = 0, their 
variance will increase linearly with t. In general, in the d-dimensional space, the variance 
of Brownian random walks can be written as 

a 2 (t) = \v \ 2 t 2 + (2dD)t, (20) 

where vq is the drift velocity of the system. Here D = s 2 /(2r) is the effective diffusion 
coefficient which is related to the step length s over a short time interval r during each 
jump. If the motion at each step is not Gaussian, then the diffusion is called non-Gaussian 
diffusion. If the step length obeys other distribution, we have to deal with more generalized 
random walks. A very special case is when the step length obeys the Levy distribution, 
such a random walk is called Levy flight or Levy walk. 

3.2 Randomization via Levy Flights 

In nature, animals search for food in a random or quasi-random manner. In general, the 
foraging path of an animal is effectively a random walk because the next move is based on the 
current location/state and the transition probability to the next location. Which direction it 
chooses depends implicitly on a probability which can be modelled mathematically [3, 24]. 
For example, various studies have shown that the flight behaviour of many animals and 
insects has demonstrated the typical characteristics of Levy flights [24, 26]. Subsequently, 
such behaviour has been applied to optimization and optimal search, and preliminary results 
show its promising capability [30, 24]. 

In general, Levy distribution is stable, and can be defined in terms of a characteristic 
function or the following Fourier transform 

F(k) = exp[-a\kf], < /3 < 2, (21) 



where a is a scale parameter. The inverse of this integral is not easy, as it does not have 
nay analytical form, except for a few special cases [16, 23]. For the case of /3 = 2, we have 
F(k) = exp[— ak 2 ], whose inverse Fourier transform corresponds to a Gaussian distribution. 
Another special case is j3 = 1, which corresponds to a Cauchy distribution 
For the general case, the inverse integral 

l r°° 

L(s) = — cos(ks) exp[-a\k\P]dk, (22) 
vr Jo 

can be estimated only when s is large. We have 

a/?r(/3)sin(7r/?/2) , , 
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Levy flights are more efficient than Brownian random walks in exploring unknown, large- 
scale search space. There are many reasons to explain this efficiency, and one of them is 
due to the fact that the variance of Levy flights takes the following form 

a 2 (t) ~ t 3_/3 , l</3<2, (24) 

which increases much faster than the linear relationship (i.e., cr 2 (i) ~ t) of Brownian random 
walks. 

Studies show that Levy flights can maximize the efficiency of resource searches in un- 
certain environments. In fact, Levy flights have been observed among foraging patterns of 
albatrosses and fruit flies [24, 26, 30]. In addition, Levy flights have many applications. 
Many physical phenomena such as the diffusion of fluorenscent molecules, cooling behavior 
and noise could show Levy-flight characteristics under the right conditions [26] . 



4 Open Problems 

It is no exaggeration to say that metahueristic algorithms have been a great success in 
solving various tough optimization problems. Despite this huge success, there are many 
important questions which remain unanswered. We know how these heuristic algorithms 
work, and we also partly understand why these algorithms work. However, it is difficult to 
analyze mathematically why these algorithms are so successful, though significant progress 
has been made in the last few years [1, 22]. However, many open problems still remain. 

For all population-based metaheuristics, multiple search agents form multiple interact- 
ing Markov chains. At the moment, theoretical development in these areas are still at 
early stage. Therefore, the mathematical analysis concerning of the rate of convergence is 
very difficult, if not impossible. Apart from the mathematical analysis on a limited few 
metaheuristics, convergence of all other algorithms has not been proved mathematically, 
at least up to now. Any mathematical analysis will thus provide important insight into 
these algorithms. It will also be valuable for providing new directions for further important 



modifications on these algorithms or even pointing out innovative ways of developing new 
algorithms. 

For almost all metaheuristics including future new algorithms, an important issue to 
be addresses is to provide a balanced trade-off between local intensification and global 
diversification [5]. At present, different algorithm uses different techniques and mechanism 
with various parameters to control this, they are far from optimal. Important questions 
are: Is there any optimal way to achieve this balance? If yes, how? If not, what is the best 
we can achieve? 

Furthermore, it is still only partly understood why different components of heuristics 
and metaheuristics interact in a coherent and balanced way so that they produce efficient 
algorithms which converge under the given conditions. For example, why does a balanced 
combination of randomization and a deterministic component lead to a much more efficient 
algorithm (than a purely deterministic and/or a purely random algorithm)? How to measure 
or test if a balance is reached? How to prove that the use of memory can significantly 
increase the search efficiency of an algorithm? Under what conditions? 

In addition, from the well-known No- Free-Lunch theorems [31], we know that they have 
been proved for single objective optimization for finite search domains, but they do not hold 
for continuous infinite domains [1, 2]. In addition, they remain unproved for multiobjective 
optimization. If they are proved to be true (or not) for multiobjective optimization, what 
are the implications for algorithm development? Another important question is about 
the performance comparison. At the moment, there is no agreed measure for comparing 
performance of different algorithms, though the absolute objective value and the number of 
function evaluations are two widely used measures. However, a formal theoretical analysis 
is yet to be developed. 

Nature provides almost unlimited ways for problem-solving. If we can observe carefully, 
we are surely inspired to develop more powerful and efficient new generation algorithms. 
Intelligence is a product of biological evolution in nature. Ultimately some intelligent algo- 
rithms (or systems) may appear in the future, so that they can evolve and optimally adapt 
to solve NP-hard optimization problems efficiently and intelligently. 

Finally, a current trend is to use simplified metaheuristic algorithms to deal with complex 
optimization problems. Possibly, there is a need to develop more complex metaheuristic 
algorithms which can truly mimic the exact working mechanism of some natural or biological 
systems, leading to more powerful next-generation, self-regulating, self-evolving, and truly 
intelligent metaheuristics. 
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